Text-independent speaker verification based on broad phonetic segmentation of speech

Authors

  • Sunil K. Gupta
  • Michael Savic
Abstract

Speaker verification involves determining whether or not a test utterance belongs to a specific reference speaker. The utterance is either accepted as belonging to the reference speaker or rejected as belonging to an imposter. Speaker verification has great potential for security applications, such as physical access control, computer data access control, and automatic telephone transaction control.

The main components of a general speaker verification system are shown in Fig. 1. A speaker verification task consists of two phases. In the training phase, reference templates are created for particular speakers using the signal processor shown in Fig. 1a. During the verification phase (Fig. 1b), the identity claimed by a speaker is verified using a test utterance from that speaker. The inputs to the system consist of a test utterance (sampled at 10 kHz in our experiments) and the claimed identity of the reference speaker. The signal processor can be further subdivided into three steps: normalization, parameterization, and feature extraction. These steps involve preprocessing and information reduction (or elimination of redundancies) in the input data sequence to obtain speaker templates during the training and verification phases. The normalization step consists of noise reduction, signal amplitude level control, and time warping to reduce the effect of different speaking rates. This is followed by the parameterization step, which reduces the amount of data with minimal loss of information about the speaker characteristics. An optional feature extraction step can be used to further reduce the data. Next, the test template is compared to the reference template. The accept/reject decision is usually based on the computation of a distance function that quantifies the degree of dissimilarity between the test template and the reference template [1]. If the distance exceeds a threshold, the system rejects the match.

Comparing the test and the training templates in the verification phase is much simpler if the underlying texts of the utterances are the same. Normally, this text-dependent mode is possible only for cooperative speakers. In forensic work, for example, speakers are often uncooperative, and the test and the training texts are often not the same. This mode is called text-independent speaker verification, and the required information stored in the templates is different in this case. In general, the templates contain long-term statistical data. Error rates for text-independent recognition are considerably higher than the rates for a comparable text-dependent case. In this paper, we investigate text-independent speaker verification.

A speaker verification system produces two types of errors. A Type I error occurs when a true speaker is rejected as being an imposter. A Type II error results when an imposter is accepted by the system as the correct speaker. Naturally, the objective in a verification task is to minimize both errors.

Previous work [1, 2] on automatic text-independent speaker verification suggests that the important features for speaker discrimination are the spectral envelope parameters. Generally, in speaker verification systems, the reference speaker templates are obtained by averaging short-time spectral parameters over the complete speech utterance. In other words, an average vocal-tract shape is assumed for the duration of the utterance. However, this does not hold in practice, since it is well known that different sounds are produced by vocal-tract shapes that vary widely.
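To make the baseline concrete, the sketch below illustrates the conventional long-term averaging scheme described above: short-time spectral parameters are averaged over each utterance to form a template, and the claimed identity is accepted only if the distance between the test and reference templates falls below a threshold. This is a minimal illustration, not the authors' broad phonetic segmentation method; the frame length, window, log-magnitude parameterization, Euclidean distance, and threshold value are all assumed choices.

```python
# Minimal sketch (not the authors' implementation) of the baseline approach the
# abstract describes: average short-time spectral parameters over an utterance to
# form a speaker template, then accept or reject a claimed identity by
# thresholding the distance between test and reference templates.
import numpy as np

SAMPLE_RATE = 10_000          # 10 kHz sampling, as in the paper's experiments
FRAME_LEN = 256               # ~25.6 ms analysis frames (assumed)
HOP = 128                     # 50% frame overlap (assumed)

def spectral_template(signal: np.ndarray) -> np.ndarray:
    """Average log-magnitude spectrum over all frames (long-term template)."""
    # Amplitude normalization (part of the normalization step in the abstract).
    signal = signal / (np.max(np.abs(signal)) + 1e-12)
    frames = []
    for start in range(0, len(signal) - FRAME_LEN + 1, HOP):
        frame = signal[start:start + FRAME_LEN] * np.hamming(FRAME_LEN)
        spectrum = np.abs(np.fft.rfft(frame))
        frames.append(np.log(spectrum + 1e-12))   # log-magnitude spectrum
    # Averaging over the whole utterance assumes one "average" vocal-tract shape.
    return np.mean(frames, axis=0)

def verify(test: np.ndarray, reference: np.ndarray, threshold: float = 3.0) -> bool:
    """Accept the claimed identity if the template distance is below threshold."""
    distance = np.linalg.norm(spectral_template(test) - spectral_template(reference))
    return distance < threshold   # false reject = Type I error, false accept = Type II

# Example usage with synthetic signals (real use requires recorded utterances):
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference_utt = rng.standard_normal(SAMPLE_RATE * 2)   # 2 s stand-in signal
    test_utt = rng.standard_normal(SAMPLE_RATE * 2)
    print("accepted" if verify(test_utt, reference_utt) else "rejected")
```

In such a scheme the two error types trade off through the threshold: lowering it reduces Type II (false accept) errors at the cost of more Type I (false reject) errors.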


Similar Articles

Speaker verification based on phonetic decision making

Speaker verification based on phone modelling is examined in this paper. Phone modelling is attractive because different phonemes have different levels of usefulness for speaker recognition, and because phone modelling essentially makes a speaker verification algorithm text-independent. The speaker verification system used here is based on a two-stage approach, where speech recognition (segmen...


DNN i-Vector Speaker Verification with Short, Text-Constrained Test Utterances

We investigate how to improve the performance of DNN i-vector based speaker verification for short, text-constrained test utterances, e.g. connected digit strings. A text-constrained verification, due to its smaller, limited vocabulary, can deliver better performance than a text-independent one for a short utterance. We study the problem with “phonetically aware” Deep Neural Net (DNN) in its cap...


Gaussian mixture modelling of broad phonetic and syllabic events for text-independent speaker verification

This paper examines the usefulness of a multilingual broad syllable-based framework for text-independent speaker verification. Syllabic segmentation is used in order to obtain a convenient unit for constrained and more detailed model generation. Gaussian mixture models are chosen as a suitable modelling paradigm for initial testing of the framework. Promising results are presented for the NIST ...


Impact of frame rate on automatic speech-text alignment for corpus-based phonetic studies

Phonetic segmentation is the basis for many phonetic and linguistic studies. As manual segmentation is a lengthy and tedious task, automatic procedures have been developed over the years. They rely on acoustic Hidden Markov Models. Many studies have been conducted, and refinements developed for corpus-based speech synthesis, where the technology is mainly used in a speaker-dependent context and...


High performance text-independent speaker recognition system based on voiced/unvoiced segmentation and multiple neural nets

This paper presents a text-independent speaker recognition system based on the voiced segments of the speech signal. The proposed system uses feedforward MLP classification with only a limited amount of training and testing data and gives a comparatively high accuracy. The techniques employed are: the Rasta-PLP speech analysis for parameter estimation, a feedforward MLP for voiced/unvoiced segm...



Journal:
  • Digital Signal Processing

Volume 2, Issue 

Pages -

Publication date: 1992